
    Assessing the Prosody of Non-Native Speakers of English: Measures and Feature Sets

    In this paper, we describe a new database of audio recordings of non-native (L2) speakers of English, and the perceptual evaluation experiment conducted with native English speakers to assess the prosody of each recording. These annotations are then used to compute the gold standard using different methods, and a series of regression experiments is conducted to evaluate their impact on the performance of a regression model predicting the degree of naturalness of L2 speech. Further, we compare the relevance of different feature groups: features modelling prosody in general (without speech tempo), speech rate and pause features modelling speech tempo (fluency), voice quality features, and a variety of spectral features. We also discuss the impact of various fusion strategies on performance. Overall, our results demonstrate that the prosody of non-native speakers of English can be reliably assessed using supra-segmental audio features; prosodic features seem to be the most important ones.
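
    The gold-standard computation and regression setup described above can be sketched as follows; this is a minimal illustration assuming scikit-learn, with invented rating values and feature columns (not the database's actual annotations or feature set):

    ```python
    # Sketch only: mean-rating gold standard plus an SVR predictor,
    # evaluated with cross-validated predictions. Ratings, features,
    # and the averaging scheme are assumptions for illustration.
    import numpy as np
    from sklearn.model_selection import cross_val_predict
    from sklearn.svm import SVR

    # rows: recordings; columns: individual native-listener ratings
    ratings = np.array([[4, 5, 4], [2, 3, 2], [5, 4, 5],
                        [3, 3, 4], [1, 2, 2], [4, 4, 5]])
    gold = ratings.mean(axis=1)  # one possible gold standard: the mean rating

    # hypothetical supra-segmental features (e.g. F0 range, pause ratio)
    X = np.array([[1.2, 0.10], [0.4, 0.40], [1.5, 0.05],
                  [0.9, 0.20], [0.3, 0.50], [1.3, 0.08]])

    pred = cross_val_predict(SVR(kernel="rbf"), X, gold, cv=3)
    print(np.corrcoef(gold, pred)[0, 1])  # correlation with the gold standard
    ```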

    I hear you eat and speak: automatic recognition of eating condition and food type, use-cases, and impact on ASR performance

    We propose a new recognition task in the area of computational paralinguistics: automatic recognition of eating conditions in speech, i.e., whether people are eating while speaking, and what they are eating. To this end, we introduce the audio-visual iHEARu-EAT database featuring 1.6k utterances of 30 subjects (mean age: 26.1 years, standard deviation: 2.66 years, gender balanced, German speakers), six types of food (Apple, Nectarine, Banana, Haribo Smurfs, Biscuit, and Crisps), and read as well as spontaneous speech; the database is made publicly available for research purposes. We start by demonstrating that for automatic speech recognition (ASR), it pays off to know whether speakers are eating or not. We then propose automatic classification based both on brute-forced low-level acoustic features and on higher-level features related to intelligibility, obtained from an automatic speech recogniser. Prediction of the eating condition was performed with a Support Vector Machine (SVM) classifier in a leave-one-speaker-out evaluation framework. Results show that the binary prediction of the eating condition (i.e., eating or not eating) can be easily solved independently of the speaking condition; the obtained average recalls are all above 90%. Low-level acoustic features provide the best performance on spontaneous speech, reaching up to 62.3% average recall for multi-way classification of the eating condition, i.e., discriminating the six types of food as well as not eating. Early fusion of the intelligibility-related features with the brute-forced acoustic feature set improves the performance on read speech, reaching 66.4% average recall on the multi-way classification task. Analysing features and classifier errors leads to a suitable ordinal scale for eating conditions, on which automatic regression can be performed with a determination coefficient of up to 56.2%.
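
    A minimal sketch of the evaluation protocol named in the abstract, i.e., an SVM in a leave-one-speaker-out loop scored by average recall; all data below are random placeholders, not iHEARu-EAT features:

    ```python
    # Sketch of a leave-one-speaker-out SVM evaluation scored by
    # unweighted average recall (UAR). All values are random placeholders.
    import numpy as np
    from sklearn.metrics import recall_score
    from sklearn.model_selection import LeaveOneGroupOut, cross_val_predict
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(60, 8))            # acoustic feature vectors
    y = rng.integers(0, 2, size=60)         # 0 = not eating, 1 = eating
    speakers = np.repeat(np.arange(6), 10)  # speaker ID per utterance

    pred = cross_val_predict(SVC(kernel="linear"), X, y,
                             cv=LeaveOneGroupOut(), groups=speakers)
    print(recall_score(y, pred, average="macro"))  # macro recall == UAR
    ```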

    The Prosodic Marking of Phrase Boundaries: Expectations and Results

    Using sentence templates and a stochastic context-free grammar, a large corpus (10,000 sentences) has been created in which prosodic phrase boundaries are labeled automatically during sentence generation. With perception experiments on a subset of 500 utterances, we verified that 92% of the automatically marked boundaries were perceived as prosodically marked. In initial automatic classification experiments with three levels of boundaries, recognition rates of up to 81% were achieved. A successful automatic detection of phrase boundaries can be of great help for parsing a word hypotheses graph in an automatic speech understanding (ASU) system. Our recognition paradigm lies within the statistical approach; we therefore need a large training database, i.e. a corpus with reference labels for prosodically marked phrase boundaries. In this paper we will …
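
    The generation idea, i.e., boundary labels falling out of the grammar for free, can be illustrated with a toy context-free grammar (invented here, not the authors' templates) that emits a marker after each phrase-level constituent:

    ```python
    # Toy context-free grammar whose expansions emit a boundary marker
    # "|B" after each phrase, so every generated sentence is labeled.
    import random

    GRAMMAR = {
        "S":  [["NP", "VP", "|B"]],
        "NP": [["the", "train", "|B"], ["the", "passenger", "|B"]],
        "VP": [["arrives", "PP"], ["waits"]],
        "PP": [["at", "platform", "two"]],
    }

    def generate(symbol="S"):
        if symbol not in GRAMMAR:  # terminal word or boundary marker
            return [symbol]
        expansion = random.choice(GRAMMAR[symbol])
        return [word for sym in expansion for word in generate(sym)]

    print(" ".join(generate()))
    # e.g. "the train |B arrives at platform two |B"
    ```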

    Automatic classification of prosodically marked phrase boundaries in German

    A large corpus has been created automatically and read by speakers. Phrase boundaries were labeled in the sentences automatically during sentence generation. Perception experiments on a subset of 500 utterances showed high agreement between the automatically generated boundary markers and the ones perceived by listeners. Gaussian and polynomial classifiers were trained on a set of prosodic features computed from the speech signal, using the automatically generated boundary markers. Comparing the classification results with the judgments of the listeners yielded a recognition rate of 87%. A combination with stochastic language models improved the recognition rate to 90%. We found that pause and durational features are the most important for the classification, but that the influence of F0 is not negligible.
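
    The classification stage can be sketched roughly as follows, with a full-covariance Gaussian classifier on placeholder prosodic features and an assumed language-model boundary probability combined in the log domain (an illustration, not the paper's exact combination scheme):

    ```python
    # Sketch: a full-covariance Gaussian classifier on prosodic features,
    # combined with an assumed language-model boundary probability by
    # summing log-scores. Features and probabilities are placeholders.
    import numpy as np
    from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 3))     # e.g. pause, duration, F0 features
    y = rng.integers(0, 2, size=200)  # 0 = no boundary, 1 = boundary

    gauss = QuadraticDiscriminantAnalysis().fit(X, y)  # one Gaussian per class
    p_acoustic = gauss.predict_proba(X[:1])[0]

    p_lm = np.array([0.3, 0.7])       # assumed LM boundary probabilities
    combined = np.log(p_acoustic) + np.log(p_lm)
    print("boundary" if combined[1] > combined[0] else "no boundary")
    ```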

    Prosodic scoring of word hypotheses graphs

    Prosodic boundary detection is important for disambiguating parsing, especially in spontaneous speech, where elliptic sentences occur frequently. Word graphs are an efficient interface between word recognition and the parser. Prosodic classification of word chains has been published earlier; the adjustments necessary for applying these classification techniques to word graphs are discussed in this paper. When classifying a word hypothesis, a set of context words has to be determined appropriately. A method has been developed to use stochastic language models for prosodic classification; this, too, has been adapted for use on word graphs. We also improved the set of acoustic-prosodic features, reducing recognition errors by about 60% on the read speech we were working on previously and now achieving a 10% error rate for 3 boundary classes and 3% for 2 accent classes. Moving to spontaneous speech, the recognition error increases significantly (e.g., 16% for a 2-class boundary task). We show that even on word graphs, the combination of language models, which model a larger context, with acoustic-prosodic classifiers reduces the recognition error by up to 50%.
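
    As a toy illustration of scoring word-graph edges with interpolated prosodic and language-model evidence (the graph, probabilities, and weights below are all invented):

    ```python
    # Toy word graph: each edge carries an acoustic-prosodic and a
    # language-model boundary probability; a weighted log-linear
    # interpolation yields the combined prosodic score for the edge.
    import math

    edges = {  # (from_word, to_word) -> (prosodic prob, LM prob)
        ("I", "want"):  (0.10, 0.05),
        ("want", "to"): (0.20, 0.10),
        ("to", "go"):   (0.85, 0.70),  # strong boundary evidence here
    }

    W_PROS, W_LM = 0.6, 0.4  # interpolation weights (invented)
    for edge, (p_pros, p_lm) in edges.items():
        score = W_PROS * math.log(p_pros) + W_LM * math.log(p_lm)
        print(edge, round(score, 3))
    ```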

    René Drouin: The Spectator of the Arts

    René Drouin, gallery owner and art publisher: "a singular presence", "a poet", "a free thinker". These are the terms that, from one text to the next, characterize this figure of the art world. Having lived through nearly the entire twentieth century, he admits to having done "things with freedom, without being a party man". The exhibition and the museum's catalogue reflect the choices of René Drouin. The man who very quickly noticed the major French artists of the second half of the century is the p…

    The ACII 2022 Affective Vocal Bursts Workshop & Competition: understanding a critically understudied modality of emotional expression

    The ACII Affective Vocal Bursts Workshop & Competition is focused on understanding multiple affective dimensions of vocal bursts: laughs, gasps, cries, screams, and many other non-linguistic vocalizations central to the expression of emotion and to human communication more generally. This year's competition comprises four tracks using a large-scale, in-the-wild dataset of 59,299 vocalizations from 1,702 speakers. The first, the A-VB-High task, requires participants to perform multi-label regression on a novel model of emotion, utilizing ten richly annotated classes of emotional expression intensity, including Awe, Fear, and Surprise. The second, the A-VB-Two task, utilizes the more conventional two-dimensional model of emotion: arousal and valence. The third, the A-VB-Culture task, requires participants to explore the cultural aspects of the dataset by training native-country-dependent models. Finally, for the fourth task, A-VB-Type, participants should recognize the type of vocal burst (e.g., laughter, cry, grunt) in an 8-class classification task. This paper describes the four tracks and baseline systems, which use state-of-the-art machine learning methods. The baseline performance for each track is obtained with an end-to-end deep learning model and is as follows: for A-VB-High, a mean (over the 10 dimensions) Concordance Correlation Coefficient (CCC) of 0.5687; for A-VB-Two, a mean (over the 2 dimensions) CCC of 0.5084; for A-VB-Culture, a mean CCC of 0.4401 across the four cultures; and for A-VB-Type, an Unweighted Average Recall (UAR) of 0.4172 over the 8 classes.
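
    The regression tracks are scored with the Concordance Correlation Coefficient; a standard NumPy implementation of Lin's CCC (the usual definition, not necessarily the organizers' exact evaluation script) looks like this:

    ```python
    # Lin's concordance correlation coefficient for two vectors.
    import numpy as np

    def ccc(x, y):
        mx, my = x.mean(), y.mean()
        vx, vy = x.var(), y.var()  # population variances
        cov = ((x - mx) * (y - my)).mean()
        return 2 * cov / (vx + vy + (mx - my) ** 2)

    gold = np.array([0.1, 0.4, 0.8, 0.6])
    pred = np.array([0.2, 0.35, 0.7, 0.65])
    print(f"CCC = {ccc(gold, pred):.4f}")
    ```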

    "Roger", "Sorry", "I'm still listening" : dialog guiding signals in information retrieval dialogs

    During any kind of information retrieval dialog, the repetition of parts of the information just given by the dialog partner can often be observed. As these repetitions are usually elliptic, intonation is very important for determining the speaker's intention. In this paper, the times of day repeated by the customer in train timetable inquiry dialogs are investigated as a prototypical case. A scheme is developed for the officer's reactions depending on the intonation of these repetitions; it has been integrated into our speech understanding and dialog system EVAR (cf. [6]). Gaussian classifiers were trained to distinguish the dialog guiding signals confirmation, question, and feedback; recognition rates of up to 87.5% were obtained.
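
    The three-way intonation decision can be sketched with a diagonal-covariance Gaussian classifier on a single invented F0-slope feature (rising contours suggesting a question, falling ones a confirmation); all values below are fabricated:

    ```python
    # Diagonal-covariance Gaussian classifier on a fabricated F0-slope
    # feature (Hz/s over the repeated time-of-day phrase).
    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    X = np.array([[-40.0], [-35.0], [-50.0], [45.0], [60.0],
                  [5.0], [-3.0], [55.0], [2.0]])
    y = np.array(["confirmation", "confirmation", "confirmation",
                  "question", "question", "feedback", "feedback",
                  "question", "feedback"])

    clf = GaussianNB().fit(X, y)
    print(clf.predict([[50.0], [-45.0], [0.0]]))
    # expected: ['question' 'confirmation' 'feedback']
    ```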